The data used in this project is from the Analyze Boston Property Assessment series.
After completing the Data Preparation detailed below, our data set has records for 30,082 single family dwellings in Boston. These include property information as of FY2021 and total assessment values from FY2015 to FY2021.
Note: Initial steps are handled in CS544_BostonProperties_ETL.R
TOTAL_VALUE in FY2021 data from string ($300,000.00) to numeric format (300000).PID, PTYPE (equivalent to LUC in FY2021 data), and AV_TOTAL (equivalent to TOTAL_VALUE in FY2021 data) from the following:
LUC==101 (FY2021) or PTYPE==101 (all other years).PID in FY2015-FY2017, remove trailing characters and change from string (0100003000_) to numeric (100003000).TOTAL_VALUE (FY2021) or AV_TOTAL (all other years) to VALUE_{YEAR}.VALUE_{YEAR} from all other years onto FY2021 data on PID.AMT_CHANGE and PCT_CHANGE to show how much total assessment values changed from 2015 to 2021.| Feature | Description |
|---|---|
PID |
Unique 10-digit parcel number. First 2 digits are the ward, digits 3 to 7 are the parcel, and digits 8 to 10 are the sub-parcel. |
CITY |
City of parcel. |
ZIPCODE |
Zip code of parcel. |
LUC |
State class code. Indicates type of property. |
OWN_OCC |
One-character code indicating if owner receives residential exemption as an owner-occupied property. |
LIVING_AREA |
Living area square footage of the property. |
YR_BUILT |
Year property was built. |
EXT_COND |
Exterior condition. |
BED_RMS |
Total number of bedrooms in the structure. |
FULL_BTH |
Total number of full baths in the structure. |
VALUE_{YEAR} |
Total assessed value for property for the given fiscal year. Originally named AV_TOTAL in data sets for FY2015-2020 and TOTAL_VALUE for FY2021. |
AMT_CHANGE |
Amount change in total assessed value between FY2015 and FY2021. (Derived.) |
PCT_CHANGE |
Percent change in total assessed value between FY2015 and FY2021. (Derived.) |
Click on the tabs to view associated charts.
The majority of single family dwellings in Boston are owner-occupied as of FY2021.
Most single family dwellings have 1 or 2 full bathrooms, though there are those with more. This includes an outlier listed as having 12 full bathrooms.
Overall, it is easy to see that single family dwellings in the city of Boston have the highest median total assessment values and the greatest change in dollar amount from FY2015 to FY2021.
The upper fence of values in FY2021 for the city of Boston is $7.7M - greater than any outliers for any other cities. Likewise, the upper fence for change in amount from 2015 to 2021 is $2.5M - again, greater than any outliers for any other cities.
However, it is worth noting that when considering the percent change between the years, the median percent change is actually greater for other cities.
---
title: "Analysis of Boston Properties"
author: "Dawn Graham, Sylvie Xiang"
output:
flexdashboard::flex_dashboard:
vertical_layout: fill
source_code: embed
# theme:
# version: 4
# bootswatch: journal
---
Overview
=====================================
Column {data-width=600}
-------------------------------------
### About the Data Set {data-height=90}
The data used in this project is from the [Analyze Boston Property Assessment](https://data.boston.gov/dataset/property-assessment){target="_blank"} series.
After completing the Data Preparation detailed below, our data set has records for **30,082 single family dwellings in Boston**. These include property information as of FY2021 and total assessment values from FY2015 to FY2021.
### Data Preparation
```{r}
library(plotly)
library(plyr)
## Read in data
pa <- read.csv("https://raw.githubusercontent.com/dawngraham/cs544-boston-properties/main/pa.csv",colClasses=c("ZIPCODE"="character"))
# Remove duplicate rows
pa <- pa[!duplicated(pa),]
# Derive change in values from 2015 to 2021
pa$AMT_CHANGE <- pa$VALUE_2021 - pa$VALUE_2015
pa$PCT_CHANGE <- round(pa$AMT_CHANGE / pa$VALUE_2015 * 100)
```
Note: Initial steps are handled in [CS544_BostonProperties_ETL.R](https://github.com/dawngraham/cs544-boston-properties/blob/main/CS544_BostonProperties_ETL.R){target="_blank"}
- Get 11 features from [Property Assessment FY2021](https://data.boston.gov/dataset/property-assessment/resource/c4b7331e-e213-45a5-adda-052e4dd31d41){target="_blank"}. See Code Book on this page.
- Convert `TOTAL_VALUE` in FY2021 data from string (`$300,000.00`) to numeric format (`300000`).
- Get `PID`, `PTYPE` (equivalent to `LUC` in FY2021 data), and `AV_TOTAL` (equivalent to `TOTAL_VALUE` in FY2021 data) from the following:
- [Property Assessment FY2020](https://data.boston.gov/dataset/property-assessment/resource/8de4e3a0-c1d2-47cb-8202-98b9cbe3bd04){target="_blank"}
- [Property Assessment FY2019](https://data.boston.gov/dataset/property-assessment/resource/695a8596-5458-442b-a017-7cd72471aade){target="_blank"}
- [Property Assessment FY2018](https://data.boston.gov/dataset/property-assessment/resource/fd351943-c2c6-4630-992d-3f895360febd){target="_blank"}
- [Property Assessment FY2017](https://data.boston.gov/dataset/property-assessment/resource/062fc6fa-b5ff-4270-86cf-202225e40858){target="_blank"}
- [Property Assessment FY2016](https://data.boston.gov/dataset/property-assessment/resource/cecdf003-9348-4ddb-94e1-673b63940bb8){target="_blank"}
- [Property Assessment FY2015](https://data.boston.gov/dataset/property-assessment/resource/bdb17c2b-e9ab-44e4-a070-bf804a0e1a7f){target="_blank"}
- Limit to data about **single family dwellings** by getting only records where `LUC==101` (FY2021) or `PTYPE==101` (all other years).
- For `PID` in FY2015-FY2017, remove trailing characters and change from string (`0100003000_`) to numeric (`100003000`).
- Rename `TOTAL_VALUE` (FY2021) or `AV_TOTAL` (all other years) to `VALUE_{YEAR}`.
- Merge `VALUE_{YEAR}` from all other years onto FY2021 data on `PID`.
- Create derived features `AMT_CHANGE` and `PCT_CHANGE` to show how much total assessment values changed from 2015 to 2021.
- Remove complete duplicates.
Column {data-width=400}
-------------------------------------
### Code Book
| Feature | Description |
|---|-----------|
| `PID` | Unique 10-digit parcel number. First 2 digits are the ward, digits 3 to 7 are the parcel, and digits 8 to 10 are the sub-parcel.|
| `CITY` | City of parcel. |
| `ZIPCODE` | Zip code of parcel. |
| `LUC` | State class code. Indicates type of property. |
| `OWN_OCC` | One-character code indicating if owner receives residential exemption as an owner-occupied property. |
| `LIVING_AREA` | Living area square footage of the property. |
| `YR_BUILT` | Year property was built. |
| `EXT_COND` | Exterior condition. |
| `BED_RMS` | Total number of bedrooms in the structure. |
| `FULL_BTH` | Total number of full baths in the structure. |
| `VALUE_{YEAR}` | Total assessed value for property for the given fiscal year. Originally named `AV_TOTAL` in data sets for FY2015-2020 and `TOTAL_VALUE` for FY2021. |
| `AMT_CHANGE` | Amount change in total assessed value between FY2015 and FY2021. (Derived.) |
| `PCT_CHANGE` | Percent change in total assessed value between FY2015 and FY2021. (Derived.) |
Analysis {data-orientation=columns}
=====================================
Column
-------------------------------------
### Analysis
Click on the tabs to view associated charts.
#### Year Built
#### Owner Occupied
The majority of single family dwellings in Boston are owner-occupied as of FY2021.
#### Bedrooms
#### Full Bathrooms
Most single family dwellings have 1 or 2 full bathrooms, though there are those with more. This includes an outlier listed as having 12 full bathrooms.
Column {.tabset}
-------------------------------------
### Year Built
```{r}
```
### Owner Occupied
```{r}
# Categorical variable: OWN_OCC (Dawn)
pa$OWN_OCC <- mapvalues(pa$OWN_OCC,
from=c("Y", "N"),
to=c("Yes", "No"))
own_occ <- table(pa$OWN_OCC)
plot_ly(x = names(own_occ),
y = as.numeric(own_occ),
type = "bar"
) %>%
layout(title = "Owner Occupied",
xaxis = list(categoryorder = "category descending"),
yaxis = list(title = "Single Family Dwellings")
)
```
### Bedrooms
```{r}
```
### Full Bathrooms
```{r}
# Numerical variable: FULL_BTH (Dawn)
full_bth <- table(pa$FULL_BTH)
full_bth_names <- as.numeric(names(full_bth))
plot_ly(x = full_bth_names,
y = as.numeric(full_bth),
type = "bar"
)%>%
layout(title = "Full Bathrooms",
xaxis = list(title = "Bathrooms",
tickvals = seq(1:max(full_bth_names))),
yaxis = list(title = "Single Family Dwellings")
)
```
City & Total Value {data-orientation=rows}
=====================================
Row
-------------------------------------
### City & Total Assessment Values
Overall, it is easy to see that single family dwellings in the city of Boston have the highest median total assessment values and the greatest change in dollar amount from FY2015 to FY2021.
The upper fence of values in FY2021 for the city of Boston is $7.7M - greater than any outliers for any other cities. Likewise, the upper fence for change in amount from 2015 to 2021 is $2.5M - again, greater than any outliers for any other cities.
However, it is worth noting that when considering the percent change between the years, the median percent change is actually greater for other cities.
### 2021 Total Assessment Values
```{r}
# CITY & TOTAL_VALUE (Dawn)
fig <- plot_ly(pa, y = ~VALUE_2021, color = ~CITY, type = "box")
fig <- fig %>% layout(yaxis = list(title = ""))
fig
```
Row
-------------------------------------
### Total Assessment Value Change from 2015 to 2021 ($)
```{r}
fig <- plot_ly(pa, y = ~AMT_CHANGE, color = ~CITY, type = "box")
fig <- fig %>% layout(yaxis = list(title = "Change in Dollars"))
fig
```
### Total Assessment Value Change from 2015 to 2021 (%)
```{r}
fig <- plot_ly(pa, y = ~PCT_CHANGE, color = ~CITY, type = "box")
fig <- fig %>% layout(yaxis = list(title = "Change in Percentage"))
fig
```
Distribution {data-orientation=columns}
=====================================
Column
-------------------------------------
### Distribution
```{r}
```
Column
-------------------------------------
### 2015-2021 Total Assessment Values
```{r}
# TOTAL_VALUE (Dawn)
fig <- plot_ly(pa, y=pa$VALUE_2015, type = "box", name="2015")
fig <- fig %>% add_trace(pa, y=pa$VALUE_2016, name="2016")
fig <- fig %>% add_trace(pa, y=pa$VALUE_2017, name="2017")
fig <- fig %>% add_trace(pa, y=pa$VALUE_2018, name="2018")
fig <- fig %>% add_trace(pa, y=pa$VALUE_2019, name="2019")
fig <- fig %>% add_trace(pa, y=pa$VALUE_2020, name="2020")
fig <- fig %>% add_trace(pa, y=pa$VALUE_2021, name="2021")
fig
```
### 2021 Total Assessment Values
```{r}
fig <- plot_ly(pa, x = ~VALUE_2021, type = "histogram")
fig <- fig %>% layout(xaxis = list(title = ""))
fig
```
Central Limit Theorem {data-orientation=rows}
=====================================
Row {data-height=600}
-------------------------------------
### Chart 1
```{r}
```
Row {data-height=400}
-------------------------------------
### Chart 2
```{r}
```
### Chart 3
```{r}
```
Sampling {data-orientation=columns}
=====================================
Column
-------------------------------------
### Sampling
```{r}
```
Column
-------------------------------------
### Distribution of 2021 Total Assessment Values
```{r}
# (Dawn)
fig <- plot_ly(pa, x = ~VALUE_2021, type = "histogram", histnorm='probability')
fig <- fig %>% layout(xaxis = list(title = ""))
fig
```
### Distribution of 2021 Total Assessment Values with Simple Random Sampling
```{r}
```
### Distribution of 2021 Total Assessment Values with Systematic Sampling
```{r}
# Systematic (Dawn)
N <- nrow(pa)
n <- 50
k <- ceiling(N / n)
r <- sample(k, 1)
# select every kth item
s <- seq(r, by = k, length = n)
sample <- pa[s, ]
fig <- plot_ly(sample, x = ~VALUE_2021, type = "histogram", histnorm='probability')
fig <- fig %>% layout(xaxis = list(title = ""))
fig
```